{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "365f05f9-f3c6-4b67-b494-47d241dd7950",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Host an object detection model as a service\n",
    "\n",
    "Ray Serve is a scalable model-serving framework that allows deploying machine learning models as microservices. This tutorial uses Ray Serve to deploy an object detection model using Faster R-CNN. The model detects whether a person is wearing a mask correctly, incorrectly, or not at all."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d5eb518-8e35-45ad-a0fd-194094080bbe",
   "metadata": {
    "tags": []
   },
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "  <b>Anyscale-specific configuration</b>\n",
    "  \n",
    "  <p>Note: This tutorial is optimized for the Anyscale platform. When running on open source Ray, additional configuration is required. For example, you need to manually:</p>\n",
    "  \n",
    "  <ul>\n",
    "    <li>\n",
    "      <b>Configure your Ray Cluster:</b> Set up your multi-node environment, including head and worker nodes, and manage resource allocation like autoscaling and GPU/CPU assignments, without the Anyscale automation. See <a href=\"https://docs.ray.io/en/latest/cluster/getting-started.html\">Ray Clusters</a> for details.\n",
    "    </li>\n",
    "    <li>\n",
    "      <b>Manage dependencies:</b> Install and manage dependencies on each node because you won’t have Anyscale’s Docker-based dependency management. See <a href=\"https://docs.ray.io/en/latest/ray-core/handling-dependencies.html\">Environment Dependencies</a> for instructions on installing and updating Ray in your environment.\n",
    "    </li>\n",
    "    <li>\n",
    "      <b>Set up storage:</b> Configure your own distributed or shared storage system instead of relying on Anyscale’s integrated cluster storage. See <a href=\"https://docs.ray.io/en/latest/train/user-guides/persistent-storage.html\">Configuring Persistent Storage</a> for suggestions on setting up shared storage solutions.\n",
    "    </li>\n",
    "  </ul>\n",
    "\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33e35acd-101f-40e5-9f6d-8569ccc9f384",
   "metadata": {},
   "source": [
    "\n",
    "## Why use Ray Serve and Anyscale\n",
    "\n",
    "### Scalability and performance\n",
    "\n",
    "- **Automatic scaling**: Ray Serve scales horizontally, which means your deployment can handle a growing number of requests by distributing the load across multiple machines and GPUs. This feature is particularly useful for production environments where traffic can be unpredictable.\n",
    "- **Efficient resource utilization**: With features like fractional GPU allocation and dynamic scheduling, Ray Serve uses resources efficiently, resulting in lower operational costs while maintaining high throughput for model inferences.\n",
    "\n",
    "### Framework-agnostic model serving\n",
    "\n",
    "- **Broad compatibility**: Whether you’re using deep learning frameworks like PyTorch, TensorFlow, or Keras, or even traditional libraries such as Scikit-Learn, Ray Serve offers a unified platform to deploy these models.\n",
    "- **Flexible API development**: Beyond serving models, you can integrate any Python business logic. This capability makes composing multiple models and integrating additional services into a single inference pipeline easier.\n",
    "\n",
    "### Advanced features for modern applications\n",
    "\n",
    "- **Dynamic request batching**: This feature allows multiple small inference requests to be batched together, reducing the per-request overhead and increasing overall efficiency.\n",
    "- **Response streaming**: For apps that need to return large outputs or stream data in real-time, response streaming can improve user experience and reduce latency.\n",
    "- **Model composition**: You can build complex, multi-step inference pipelines that integrate various models, allowing you to construct end-to-end services that combine machine learning and custom business logic.\n",
    "\n",
    "Building on Ray Serve, Anyscale Service elevates this deployment by offering a fully managed platform that streamlines infrastructure management. It automatically scales resources, integrates seamlessly with cloud services, and provides robust monitoring and security features. Together, Ray Serve and Anyscale Service enable you to deploy the mask detection model as a scalable, efficient, and reliable microservice in a production environment, effectively abstracting operational complexities while ensuring optimal performance."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbf71ccc-02f7-4c1b-8724-99b2c89dbf75",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Inspect `object_detection.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e02f435-0bdf-446d-ba5e-b2652ce5528b",
   "metadata": {
    "tags": []
   },
   "source": [
    "To start, inspect the file `object_detection.py`. This module implements a Ray Serve deployment for an object detection service using FastAPI.\n",
    "\n",
    "The code initializes a FastAPI app and uses Ray Serve to deploy two classes, one for handling HTTP requests (`APIIngress`) and one for performing object detection (`ObjectDetection`). This separation of concerns—APIIngress for HTTP interfacing and ObjectDetection for image processing—allows for scalable, efficient handling of requests, with Ray Serve managing resource allocation and replicas.\n",
    "\n",
    "**The `APIIngress` class** serves as the entry point for HTTP requests using FastAPI, exposing an endpoint (\"`/detect`\") that accepts image URLs and returns processed images. When a request hits this endpoint, `APIIngress` asynchronously delegates the task to the `ObjectDetection` service by calling its detect method. \n",
    "\n",
    "Following is the explanation of the decorators for `APIIngress` class:\n",
    "\n",
    "* `@serve.deployment(num_replicas=1)`: This decorator indicates that the ingress service, which primarily routes HTTP requests using FastAPI, runs as a single instance. For this example, it mainly acts as a lightweight router to forward requests to the actual detection service. A single replica is typically sufficient. To handle high traffic volume in production, increase this number. \n",
    "* `@serve.ingress(app)`: This decorator integrates the FastAPI app with Ray Serve. It makes the API endpoints defined in the FastAPI app accessible through the deployment. Essentially, it enables serving HTTP traffic directly through this deployment.\n",
    "\n",
    "\n",
    "**The `ObjectDetection` class** handles the core functionality: it loads a pre-trained Faster R-CNN model, processes incoming images, runs object detection to identify mask-wearing statuses, and visually annotates the images with bounding boxes and labels. \n",
    "\n",
    "Following is the explanation of the decorators  for `ObjectDetection` class:\n",
    "\n",
    "* `ray_actor_options={\"num_gpus\": 1}`: This configuration assigns one GPU to each replica of the ObjectDetection service. Given that the service loads a deep learning model (Faster R-CNN) for mask detection, having GPU resources is essential for accelerating inference. This parameter makes sense if your infrastructure has GPU resources available and you want each actor to leverage hardware acceleration.\n",
    "* `autoscaling_config={\"min_replicas\": 1, \"max_replicas\": 10}`:  `min_replicas: 1` ensures that at least one replica is always running, providing baseline availability. `max_replicas: 10` limits the maximum number of replicas to 10, which helps control resource usage while accommodating potential spikes in traffic.\n",
    "\n",
    "Then, `bind` the deployment with optional arguments to the constructor to define an app. Finally, deploy the resulting app using `serve.run` (or the equivalent `serve run` CLI command).\n",
    "\n",
    "For more details, see: https://docs.ray.io/en/latest/serve/configure-serve-deployment.html\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8704b14e-f992-4f81-8c3e-11d676a9f28e",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Run the object detection service with Ray Serve"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfbfaa90-d4a2-45e1-bf9b-6b64069866bf",
   "metadata": {
    "tags": []
   },
   "source": [
    "To launch the object detection service, launch the terminal from an Anyscale workspace and use the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7931e2c4-6436-4df4-8c7e-240d38ef0af7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "! serve run object_detection:entrypoint --non-blocking"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1140b34e-9b22-4c25-ab65-f589975c6c8b",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Send a request to the service\n",
    "\n",
    "To test the deployed model, send an HTTP request to the service using Python. The following code fetches an image, sends it to the detection service, and displays the output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2224521-2fc3-413e-9370-4ab6936b59eb",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "from PIL import Image\n",
    "from io import BytesIO\n",
    "from IPython.display import display\n",
    "\n",
    "image_url = \"https://face-masks-data.s3.us-east-2.amazonaws.com/all/images/maksssksksss5.png\"\n",
    "resp = requests.get(f\"http://127.0.0.1:8000/detect?image_url={image_url}\")\n",
    "\n",
    "# Display the image\n",
    "image = Image.open(BytesIO(resp.content))\n",
    "display(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6a306c0-c8a2-4738-9616-62880effa1f1",
   "metadata": {},
   "source": [
    "## Shut down the service\n",
    "\n",
    "Use the following command to shut down the service:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "622f18c9-0826-4fdf-9833-8f067b45f527",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!serve shutdown --yes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f60234d0-e408-478d-af3e-e3bb5b67d3bd",
   "metadata": {},
   "source": [
    "## Production deployment\n",
    "For production deployment, use Anyscale Services to deploy the Ray Serve application to a dedicated cluster without modifying the code. Anyscale ensures scalability, fault tolerance, and load balancing, keeping the service resilient against node failures, high traffic, and rolling updates.\n",
    "\n",
    "### Deploy as an Anyscale Service\n",
    "Use the following to deploy `my_service` in a single command:\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "187a2297",
   "metadata": {},
   "outputs": [],
   "source": [
    "!anyscale service deploy object_detection:entrypoint --name=face_mask_detection_service"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73f7dca2",
   "metadata": {},
   "source": [
    "## Check the status of the service\n",
    "To get the status of `my_service`, run the following:\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20296bdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "!anyscale service status --name=face_mask_detection_service"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4daea9c-f9c8-45c8-97d6-61a9aee251fc",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Query the service\n",
    "\n",
    "When you deploy, you expose the service to a publicly accessible IP address, which you can send requests to.\n",
    "\n",
    "In the preceding cell’s output, copy the API_KEY and BASE_URL. As an example, the values look like the following:\n",
    "\n",
    "* API_KEY: `xkRQv_4MENV7iq34gUprbQrX3NUqpk6Bv6UQpiq6Cbc`\n",
    "\n",
    "* BASE_URL: https://face-mask-detection-service-bxauk.cld-kvedzwag2qa8i5bj.s.anyscaleuserdata.com\n",
    "\n",
    "\n",
    "Fill in the following placeholder values for the BASE_URL and API_KEY in the following Python requests object:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42800e30-0061-4b05-a77a-f95227d3b88d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "API_KEY = \"xkRQv_4MENV7iq34gUprbQrX3NUqpk6Bv6UQpiq6Cbc\"  # PASTE HERE\n",
    "BASE_URL = \"https://face-mask-detection-service-bxauk.cld-kvedzwag2qa8i5bj.s.anyscaleuserdata.com\"  # PASTE HERE, remove the slash as the last character.\n",
    "\n",
    "def detect_masks(image_url: str):\n",
    "    response: requests.Response = requests.get(\n",
    "        f\"{BASE_URL}/detect\",\n",
    "        params={\"image_url\": image_url},\n",
    "        headers={\n",
    "            \"Authorization\": f\"Bearer {API_KEY}\",\n",
    "        },\n",
    "    )\n",
    "    response.raise_for_status()\n",
    "    return response  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a19789b-7773-495f-b602-ffe1c4929071",
   "metadata": {
    "tags": []
   },
   "source": [
    "Then you can call the service API and obtain the detection results:\n",
    "\n",
    "```python\n",
    "from PIL import Image\n",
    "from io import BytesIO\n",
    "from IPython.display import display\n",
    "\n",
    "image_url = \"https://face-masks-data.s3.us-east-2.amazonaws.com/all/images/maksssksksss5.png\"\n",
    "resp = detect_masks(image_url)\n",
    "# Display the image.\n",
    "image = Image.open(BytesIO(resp.content))\n",
    "display(image)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af4f69b8-f81e-4f74-827b-7251d4a06332",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Advanced configurations\n",
    "\n",
    "For production environments, Anyscale recommends using a `Serve config YAML` file, which provides a centralized way to manage system-level settings and application-specific configurations. This approach enables seamless updates and scaling of your deployments by modifying the config file and applying changes without service interruptions. For a comprehensive guide on configuring Ray Serve deployments, see the official documentation: https://docs.ray.io/en/latest/serve/configure-serve-deployment.html"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a069dbb9-6651-42f8-8ba2-21873bdd6537",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Terminate your service\n",
    "\n",
    "Remember to terminate your service after testing, otherwise it keeps running:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c07702b",
   "metadata": {},
   "source": [
    "anyscale service terminate --name=face_mask_detection_service"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb5ab08d-2630-4048-8d35-4a4e46aaa992",
   "metadata": {},
   "source": [
    "## Clean up the cluster storage\n",
    "\n",
    "You can see what files are stored in the `cluster_storage`. You can see the file `fasterrcnn_model_mask_detection.pth` that you created for fast model loading and serving. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2432b0f5",
   "metadata": {},
   "source": [
    "ls -lah /mnt/cluster_storage/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc0910ed",
   "metadata": {},
   "source": [
    "**Remember to cleanup the cluster storage by removing it:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f83afa59",
   "metadata": {},
   "source": [
    "rm -rf /mnt/cluster_storage/fasterrcnn_model_mask_detection.pth"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
